Attacking LLMs

Learn to identify and exploit LLM vulnerabilities, covering prompt injection, insecure output handling, and model poisoning.

In this module, we cover practical attacks against systems that use large language models, including prompt injection, unsafe output handling, and model poisoning . You will learn how crafted inputs and careless handling of model output can expose secrets or trigger unauthorised actions, and how poisoned training data can cause persistent failures. Each topic includes hands-on exercises and realistic scenarios that show how small issues can be linked into larger attack paths. By the end, participants can build concise proof of concept attacks and suggest clear, practical mitigations.

BankGPT

A customer service assistant used by a banking system.

HealthGPT

A safety-compliant AI assistant that has strict rules against revealing sensitive internal data.

Attacking LLMs

Input Manipulation & Prompt Injection

LLM Output Handling and Privacy Risks

Data Integrity & Model Poisoning

Juicy

BankGPT

HealthGPT

Input Manipulation & Prompt Injection​

LLM Output Handling and Privacy Risks​

Data Integrity & Model Poisoning​

Juicy​

BankGPT​

HealthGPT​

Input Manipulation & Prompt Injection

LLM Output Handling and Privacy Risks

Data Integrity & Model Poisoning

Juicy

BankGPT

HealthGPT